Text Extraction From Documents


Text extraction from documents is the process of recovering machine-readable text from scanned documents or images.

Enhancing Document VQA Models via Retrieval-Augmented Generation

Aug 28, 2025

Few-Shot Connectivity-Aware Text Line Segmentation in Historical Documents

Aug 26, 2025

Exposing Privacy Risks in Graph Retrieval-Augmented Generation

Aug 24, 2025

Named Entity Recognition of Historical Text via Large Language Model

Aug 25, 2025

mKG-RAG: Multimodal Knowledge Graph-Enhanced RAG for Visual Question Answering

Aug 07, 2025

GHTM: A Graph based Hybrid Topic Modeling Approach in Low-Resource Bengali Language

Aug 01, 2025

Combining Language and Topic Models for Hierarchical Text Classification

Jul 22, 2025

Millions of $\text{GeAR}$-s: Extending GraphRAG to Millions of Documents

Jul 23, 2025

Exploring text-to-image generation for historical document image retrieval

Jul 28, 2025

From Chaos to Automation: Enabling the Use of Unstructured Data for Robotic Process Automation

Jul 15, 2025